Check with smaller windows
Overview: Script for processing water salinity data from water stations; sources and information on the data are below. First I remove data points flagged as failed or suspect. Then I use a 2-hour rolling mean to remove data points more than +/- 2 standard deviations from that rolling mean. Finally I calculate the hourly median, since data were collected at different frequencies. This sets me up to do analysis between sites and with field data. Sites are graphed from closest to the delta to the mouth of the bay.
Notes on data:
-Accessed and downloaded all data on September 30, 2020.
-Downloaded all data available from 1/1/2017-12/31/2019.
-China Camp: http://cdmo.baruch.sc.edu/aqs/zips.cfm. Data logged every 15 minutes. Requested citation format: NOAA National Estuarine Research Reserve System (NERRS). System-wide Monitoring Program. Data accessed from the NOAA NERRS Centralized Data Management Office website: http://cdmo.baruch.sc.edu/; accessed 30 September 2020.
-Richardson Bay: NERR data. Data logged every 15 minutes.
-EOS: https://oceanview.pfeg.noaa.gov/erddap/tabledap/rtcctdRTCysi.html. Data logged every 6 minutes. Time in UTC. Should I use this link instead? http://erddap.cencoos.org/erddap/tabledap/tiburon-water-tibc1.html. pH and salinity available; no air temperature.
-Fort Point: http://erddap.cencoos.org/erddap/tabledap/fort-point.html?time%2Csea_water_temperature%2Csea_water_temperature_qc_agg%2Cz&time%3E%3D2020-09-20T21%3A07%3A41Z&time%3C%3D2020-09-29T12%3A00%3A00Z. Data logged every 6 minutes. Time in UTC.
Next steps:
-Fix the aesthetics of the graphs: font too small, change the gradient, clean up the keys, etc.
-Analysis with field data.
-Site characterization summary table.
Set up
rm(list=ls())
library(tidyverse)
library(ggpubr)
library(scales)
library(chron)
library(plotly)
library(taRifx)
library(aweek)
library(easypackages)
library(renv)
library(here)
library(ggthemes)
library(gridExtra)
library(patchwork)
library(tidyquant)
library(recipes)
library(cranlogs)
library(knitr)
library(openair)
Make a custom function to calculate the mean, standard deviation (sd), and 95% confidence interval, where x is a numeric vector. Here "hi" and "lo" are defined as 2 standard deviations from the mean. This step only creates the function and does not manipulate the data, so nothing needs to change here; you can edit the equations to define hi and lo as 3 (or however many) sds from the mean instead. na.rm=TRUE removes any "NA" values. "ret" holds the new columns that get added when you run this on a df.
custom_stat_fun_2<-function(x, na.rm=TRUE){
m<-mean(x, na.rm=TRUE)
s<- sd(x, na.rm=TRUE)
hi<-m+2*s
lo<-m-2*s
ret<-c(mean=m, stdev=s, hi.95=hi, lo.95=lo)
return(ret)
}
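As a quick sanity check, here is the function applied to a small toy vector (values chosen for illustration only); the named vector it returns is what tq_mutate spreads into new columns:

```r
# Toy vector: mean = 3, sd = sqrt(2.5) ~ 1.58, so hi/lo = 3 +/- 2*1.58.
v <- c(1, 2, 3, 4, 5)
round(custom_stat_fun_2(v), 2)
##  mean stdev hi.95 lo.95
##  3.00  1.58  6.16 -0.16
```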
Salinity
China Camp
Read in the data and subset it (not needed, but it's a large df). A later step needs a column named "date" with no time stamp.
cc<-read.csv("C:/Users/chels/Box Sync/Thesis/Data/Working data/Bouy data/china_camp_all.csv", header = TRUE, sep=",", fileEncoding="UTF-8-BOM", stringsAsFactors = FALSE)
cc$date<-as.POSIXct(cc$DateTimeStamp, format = c("%m/%d/%Y"))
cc$datetime<-as.POSIXct(cc$DateTimeStamp, format = c("%m/%d/%Y %H:%M"))
cc<-subset(cc, select=c(date, datetime, Sal, F_Sal))
cc%>%
ggplot(aes(x=datetime, y=Sal, color=Sal))+
geom_point(alpha=0.5)+
labs(title="Salinity data from China Camp: All data points",
subtitle="01/01/2017 - 12/31/2019",
caption= "data courtesy of NERR")
## Warning: Removed 847 rows containing missing values (geom_point).
Remove data that failed the QC test. An F_Sal value of -3 indicates fail.
cc<-cc[- grep("-3", cc$F_Sal),]
cc%>%
ggplot(aes(x=date, y=Sal, color=Sal))+
geom_point(alpha=0.5)+
labs(title="Salinity data from China Camp: Failed QC points removed",
subtitle="01/01/2017 - 12/31/2019",
caption= "data courtesy of NERR")
## Warning: Removed 835 rows containing missing values (geom_point).
There are still some points I'm not sure about, so I'm going to remove data points flagged as "suspect" too.
cc<-cc[- grep("1", cc$F_Sal),]
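One caution about this pattern-based removal: grep("1", ...) matches any flag string containing a "1" (e.g. "10" or "-1"), not just the suspect code itself. If the flag column only ever holds the plain codes this is fine, but an exact comparison is safer. A minimal sketch, assuming flags are stored as plain codes:

```r
# Substring match vs. exact match on a toy flag vector:
flags <- c("0", "1", "-1", "10")
grepl("1", flags)   # TRUE for "1", "-1", and "10" (substring match)
flags == "1"        # TRUE only for the exact suspect code "1"
# Row removal with an exact match would then be:
# cc <- cc[cc$F_Sal != "1", ]
```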
cc%>%
ggplot(aes(x=date, y=Sal, color=Sal))+
geom_point(alpha=0.5)+
labs(title="Salinity data from China Camp: Failed & suspect QC points removed",
subtitle="01/01/2017 - 12/31/2019",
caption= "data courtesy of NERR")
## Warning: Removed 835 rows containing missing values (geom_point).
I like this much more.
Roll apply using the custom stat function. It needs a column named "date" with no time stamp in order to work. select = the parameter/column you want to apply the rolling statistics to; in this case we're interested in salinity. FUN = the custom function created at the beginning. The width of the window depends on the frequency of data collection. Data were collected every 15 min, so width = 672 is 7 days, 1344 is 14 days, and 2880 is 30 days (96 observations per day; 8 = 2 hours).
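The width arithmetic generalizes as width = (window length in minutes) / (sampling interval in minutes); for the 2-hour window on 15-minute data:

```r
# Window width in observations = window minutes / sampling interval minutes.
width.2hr <- (2 * 60) / 15
width.2hr
## [1] 8
```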
2 hour
cc<-cc%>%
tq_mutate(
select= Sal,
mutate_fun= rollapply,
width= 8,
align= "right",
by.column= FALSE,
FUN= custom_stat_fun_2,
na.rm= TRUE)
## Warning in zoo(y, order.by = index(x), ...): some methods for "zoo" objects do
## not work if the index entries in 'order.by' are not unique
Filter to remove values that are more than +/- 2 standard deviations from the rolling mean
cc<-filter(cc, Sal <hi.95 & Sal >lo.95)
cc%>%
ggplot(aes(x=datetime, y=Sal, color=Sal))+
geom_point(alpha=0.5)+
labs(title="Salinity data from China Camp: Failed QC points removed + +/- 2sd removed",
subtitle="01/01/2017 - 12/31/2019",
caption= "data courtesy of NERR")
## Warning: Removed 7 rows containing missing values (geom_point).
Hourly median
For the timeAverage function to work, you need a date or POSIXct column named "date", but here it needs the timestamp, so we'll add it back in. interval = frequency of data collection.
cc<-subset(cc, select=-c(date))
cc$date<-as.POSIXct(cc$datetime, format = c("%m/%d/%Y %H:%M"))
cc.1hr.med<- timeAverage(cc,
avg.time="1 hour",
data.thresh = 0,
statistic = "median",
interval = "15 min",
remove.calm = FALSE,
type = "default")
## Warning: Missing dates detected, removing 7 lines
## Warning in checkPrep(mydata, vars, type = "default", remove.calm = FALSE, :
## Detected data with Daylight Saving Time, converting to UTC/GMT
cc_sal<-cc.1hr.med %>% ggplot(aes(x=datetime, y=Sal, color=Sal))+
geom_point(alpha=0.5)+ylim(0,35)+
geom_hline(yintercept=10, linetype="dashed", color = "red")+
geom_hline(yintercept=20, linetype="dashed", color = "red")+
labs(title="Hourly median salinity from China Camp",
subtitle="01/01/2017 - 12/31/2019",
caption= "data courtesy of NERR")
cc_sal
## Warning: Removed 4563 rows containing missing values (geom_point).
EOS pier
Set up
eos<-read.csv("C:/Users/chels/Box Sync/Thesis/Data/Working data/Bouy data/eos_bouy_r.csv", header = TRUE, sep=",", fileEncoding="UTF-8-BOM", stringsAsFactors = FALSE)
eos$date<-as.Date(eos$date, format = c("%m/%d/%Y"))
eos$datetime<-as.POSIXct(eos$datetime, format = c("%Y-%m-%dT%H:%M:%SZ"))
eos<-subset(eos, select=c(date, datetime, salinity))
eos%>%
ggplot(aes(x=datetime, y=salinity, color=salinity))+
geom_point(alpha=0.5)+
labs(title="Salinity data from EOS research pier: All data",
subtitle="01/01/2017 - 12/31/2019",
caption= "data courtesy of CeNCOOS")
Plot with a better view
eos%>%
ggplot(aes(x=datetime, y=salinity, color=salinity))+
geom_point(alpha=0.5)+ylim(0,35)+
labs(title="Salinity data from EOS research pier: All data",
subtitle="01/01/2017 - 12/31/2019",
caption= "data courtesy of CeNCOOS")
## Warning: Removed 2 rows containing missing values (geom_point).
Since this data set doesn't have a QC flag column, I'll remove values that are far outside the expected salinity range in Tiburon.
eos<-filter(eos, salinity >0 & salinity <40)
eos%>%
ggplot(aes(x=datetime, y=salinity, color=salinity))+
geom_point(alpha=0.5)+
labs(title="Salinity data from EOS research pier: Values outside of expected range removed",
subtitle="01/01/2017 - 12/31/2019",
caption= "data courtesy of CeNCOOS")
Not sure about the pocket of lower salinity values at the end of 2018; need to verify it. A few other points stand out, but they should be removed in the next steps.
Roll apply using the custom stat function. Data were collected every 6 minutes, so width = 1680 is 7 days, 3360 is 14 days, and 7200 is 30 days (240 observations per day; 20 = 2 hours). Needs a column named "date" with no time stamp in order to work.
2hr
eos<-eos%>%
tq_mutate(
select= salinity,
mutate_fun= rollapply,
width= 20,
align= "right",
by.column= FALSE,
FUN= custom_stat_fun_2,
na.rm= TRUE)
## Warning in zoo(y, order.by = index(x), ...): some methods for "zoo" objects do
## not work if the index entries in 'order.by' are not unique
Filter and graph
eos<-filter(eos, salinity <hi.95 & salinity >lo.95)
ggplot(data = eos, mapping = aes(x = datetime, y = salinity, color=salinity)) + geom_point()+
labs(title="Salinity data from EOS research pier: Extreme + +/- 2sd data removed",
subtitle="01/01/2017 - 12/31/2019",
caption= "data courtesy of CeNCOOS")
Looks better. Need to verify that low salinity pocket at the end of 2018.
Hourly median
eos<-subset(eos, select=-c(date))
names(eos)[names(eos) == "datetime"] <- "date"
eos$date<-as.POSIXct(eos$date, format = c("%Y-%m-%dT%H:%M:%SZ"))
eos.1hr.med<- timeAverage(eos,
avg.time="1 hour",
data.thresh = 0,
statistic = "median",
interval = "6 min",
remove.calm = FALSE,
type = "default")
## Warning in checkPrep(mydata, vars, type = "default", remove.calm = FALSE, :
## Detected data with Daylight Saving Time, converting to UTC/GMT
eos_sal<-eos.1hr.med %>% ggplot(aes(x=date, y=salinity, color=salinity))+
geom_point(alpha=0.5)+ylim(0,35)+
geom_hline(yintercept=10, linetype="dashed", color = "red")+
geom_hline(yintercept=20, linetype="dashed", color = "red")+
labs(title="Hourly median salinity from EOS research pier",
subtitle="01/01/2017 - 12/31/2019",
caption= "data courtesy of CeNCOOS")
eos_sal
## Warning: Removed 314 rows containing missing values (geom_point).
Richardson Bay
Set up
rb<-read.csv("C:/Users/chels/Box Sync/Thesis/Data/Working data/Bouy data/richardson_bay_all.csv", header = TRUE, sep=",", fileEncoding="UTF-8-BOM", stringsAsFactors = FALSE)
rb$date<-as.Date(rb$DateTimeStamp, format = c("%m/%d/%y"))
rb$datetime<-as.POSIXct(rb$DateTimeStamp, format = c("%m/%d/%y %H:%M"))
rb<-subset(rb, select=c(date, datetime, Sal, F_Sal))
rb%>%
ggplot(aes(x=datetime, y=Sal, color=Sal))+
geom_point(alpha=0.5)+
labs(title="Salinity data from Richardson Bay: All data",
subtitle="01/01/2017 - 12/31/2019",
caption= "data courtesy of NERR")
## Warning: Removed 1508 rows containing missing values (geom_point).
Remove flagged data that did not pass QAQC. An F_Sal value of -3 indicates fail.
rb<-rb[- grep("-3", rb$F_Sal),]
rb%>%
ggplot(aes(x=datetime, y=Sal, color=Sal))+
geom_point(alpha=0.5)+
labs(title="Salinity data from Richardson Bay: Failed QC data removed",
subtitle="01/01/2017 - 12/31/2019",
caption= "data courtesy of NERR")
## Warning: Removed 1508 rows containing missing values (geom_point).
Not sure about that low salinity pocket at the end of 2017, so I'm going to remove data points flagged as "suspect" and see if that helps.
rb<-rb[- grep("1", rb$F_Sal),]
rb%>%
ggplot(aes(x=datetime, y=Sal, color=Sal))+
geom_point(alpha=0.5)+
labs(title="Salinity data from Richardson Bay: Failed & suspect QC data removed",
subtitle="01/01/2017 - 12/31/2019",
caption= "data courtesy of NERR")
## Warning: Removed 1508 rows containing missing values (geom_point).
Looks better and got rid of most of the event I wasn't sure about. There are still some points I don't like; hopefully the next step helps.
Roll apply using the custom stat function. Data were collected every 15 min, so width = 672 is 7 days, 1344 is 14 days, and 2880 is 30 days (96 observations per day; 8 = 2 hours).
rb<-rb%>%
tq_mutate(
select= Sal,
mutate_fun= rollapply,
width= 8,
align= "right",
by.column= FALSE,
FUN= custom_stat_fun_2,
na.rm= TRUE)
## Warning in zoo(y, order.by = index(x), ...): some methods for "zoo" objects do
## not work if the index entries in 'order.by' are not unique
Filter
rb<-filter(rb, Sal <hi.95 & Sal >lo.95)
rb%>%
ggplot(aes(x=datetime, y=Sal, color=Sal))+
geom_point(alpha=0.5)+
labs(title="Salinity data from Richardson Bay: Failed & suspect QC data + -/+2sd removed",
subtitle="01/01/2017 - 12/31/2019",
caption= "data courtesy of NERR")
## Warning: Removed 11 rows containing missing values (geom_point).
Hourly median
rb<-subset(rb, select=-c(date))
rb$date<-as.POSIXct(rb$datetime, format = c("%m/%d/%y %H:%M"))
rb<-subset(rb, select=-c(datetime))
rb.1hr.med<- timeAverage(rb,
avg.time="1 hour",
data.thresh = 0,
statistic = "median",
interval = "15 min",
remove.calm = FALSE,
type = "default")
## Warning: Missing dates detected, removing 11 lines
## Warning in checkPrep(mydata, vars, type = "default", remove.calm = FALSE, :
## Detected data with Daylight Saving Time, converting to UTC/GMT
rb_sal<-rb.1hr.med%>%
ggplot(aes(x=date, y=Sal, color=Sal))+
geom_point(alpha=0.5)+ylim(0,35)+
geom_hline(yintercept=10, linetype="dashed", color = "red")+
geom_hline(yintercept=20, linetype="dashed", color = "red")+
labs(title="Hourly median salinity data from Richardson Bay",
subtitle="01/01/2017 - 12/31/2019",
caption= "data courtesy of NERR")
rb_sal
## Warning: Removed 11 rows containing missing values (geom_point).
Fort Point
Set up. For some reason I was getting an error later when I converted both the date and datetime columns here, so unlike the other sites I'm only converting one now and will convert the other later. The method is still consistent with the other sites.
fp<-read.csv("C:/Users/chels/Box Sync/Thesis/Data/Working data/Bouy data/fort_point_r.csv", header = TRUE, sep=",", fileEncoding="UTF-8-BOM", stringsAsFactors = FALSE)
fp$date<-as.Date(fp$date, format = c("%Y-%m-%d"))
names(fp)[names(fp) == "sea_water_practical_salinity"] <- "salinity"
names(fp)[names(fp) == "sea_water_practical_salinity_qc_agg"] <- "F_Sal"
fp<-subset(fp, select=c(datetime, date, salinity, F_Sal))
fp%>%
ggplot(aes(x=date, y=salinity, color=salinity))+
geom_point(alpha=0.5)+
labs(title="Salinity data from Fort Point: All Data",
subtitle="01/01/2017 - 12/31/2019",
caption= "data courtesy of CeNCOOS")
Remove flagged data; a flag of 4 indicates fail.
fp<-filter(fp, F_Sal!=4)
fp%>%
ggplot(aes(x=date, y=salinity, color=salinity))+
geom_point(alpha=0.5)+
labs(title="Salinity data from Fort Point: Failed QC data removed",
subtitle="01/01/2017 - 12/31/2019",
caption= "data courtesy of CeNCOOS")
Still a lot of points I don't like, and the 2-hour rolling window (applied later) did not remove enough of them on its own. I'm going to also remove points flagged as "suspect" and see if that helps.
fp<-filter(fp, F_Sal!=3)
fp%>%
ggplot(aes(x=date, y=salinity, color=salinity))+
geom_point(alpha=0.5)+
labs(title="Salinity data from Fort Point: Failed & suspect QC data removed",
subtitle="01/01/2017 - 12/31/2019",
caption= "data courtesy of CeNCOOS")
This already looks much better. It still has those points around 0, which just do not look right to me, so I'm going to pick them out for now, though I'd like a cleaner method if possible. These points weren't removed with the 2-hour window, and I'd rather pull them out here so I don't skew the next few steps.
fp<-filter(fp, salinity >1)
fp%>%
ggplot(aes(x=date, y=salinity, color=salinity))+
geom_point(alpha=0.5)+
labs(title="Salinity data from Fort Point: Failed & suspect QC + <1 removed",
subtitle="01/01/2017 - 12/31/2019",
caption= "data courtesy of CeNCOOS")
Much better. Still two points I don't like, but they should be removed in the next step.
Roll apply using the custom stat function. Data were collected every 6 minutes, so width = 1680 is 7 days, 3360 is 14 days, and 7200 is 30 days (240 observations per day; 2 hrs = 20, which on its own doesn't remove enough).
This is where I was getting an error when both the date and datetime columns were converted, so I just converted the date column, since that's what's used at this step, and will convert the other when I use it. It didn't do this before, so I'm not sure what changed.
fp<-fp%>%
tq_mutate(
select= salinity,
mutate_fun= rollapply,
width= 20,
align= "right",
by.column= FALSE,
FUN= custom_stat_fun_2,
na.rm= TRUE)
## Warning in zoo(y, order.by = index(x), ...): some methods for "zoo" objects do
## not work if the index entries in 'order.by' are not unique
fp<-filter(fp, salinity <hi.95 & salinity >lo.95)
fp%>%
ggplot(aes(x=date, y=salinity, color=salinity))+
geom_point(alpha=0.5)+
labs(title="Salinity data from Fort Point: Failed & suspect QC + +/-2sd + <1 removed",
subtitle="01/01/2017 - 12/31/2019",
caption= "data courtesy of CeNCOOS")
Hourly median
fp<-subset(fp, select=-c(date))
fp$date<-as.POSIXct(fp$datetime, format = c("%Y-%m-%dT%H:%M:%SZ"))
fp.1hr.med<- timeAverage(fp,
avg.time="1 hour",
data.thresh = 0,
statistic = "median",
interval = "6 min",
remove.calm = FALSE,
type = "default")
## Warning: Missing dates detected, removing 27 lines
## Warning in checkPrep(mydata, vars, type = "default", remove.calm = FALSE, :
## Detected data with Daylight Saving Time, converting to UTC/GMT
fp_sal<-ggplot(data = fp.1hr.med, mapping = aes(x = date, y = salinity, color=salinity)) + geom_point()+ylim(0,35)+
geom_hline(yintercept=10, linetype="dashed", color = "red")+
geom_hline(yintercept=20, linetype="dashed", color = "red")+
labs(title="Hourly median salinity data from Fort Point",
subtitle="01/01/2017 - 12/31/2019",
caption= "data courtesy of CeNCOOS")
fp_sal
## Warning: Removed 3125 rows containing missing values (geom_point).
Salinity graphs together
all_sal<-ggarrange(cc_sal, eos_sal, rb_sal, fp_sal, nrow=4, ncol=1)
## Warning: Removed 4563 rows containing missing values (geom_point).
## Warning: Removed 314 rows containing missing values (geom_point).
## Warning: Removed 11 rows containing missing values (geom_point).
## Warning: Removed 3125 rows containing missing values (geom_point).
all_sal
Still need to verify a few events but overall this is looking good.
Site characteristics
How many days had at least one hourly median reading of salinity below 10?
Using hourly median data
China Camp
If salinity is less than 10, lt10 = "low". Make a date-only column, subset to get only rows with "low" salinity events, then remove duplicated days to count the number of days.
cc.1hr.med$lt10<-ifelse(cc.1hr.med$Sal <10, "low", "no")
cc.1hr.med$date<-as.Date(cc.1hr.med$date, c(format="%Y-%m-%d"))
## Warning in as.POSIXlt.POSIXct(x, tz = tz): unknown timezone '%Y-%m-%d'
cc.low.event<-subset(cc.1hr.med, cc.1hr.med$lt10 == "low")
cc.low.event<-cc.low.event[!duplicated(cc.low.event$date),]
191
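The subset()/duplicated() steps above can also be collapsed into a single dplyr pipeline that returns the day count directly; a sketch using the same cc.1hr.med columns:

```r
# Distinct days with at least one hourly median salinity below 10.
cc.1hr.med %>%
  filter(Sal < 10) %>%
  summarise(low.days = n_distinct(as.Date(date)))
```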
Tiburon
names(eos.1hr.med)[names(eos.1hr.med) == "date"] <- "datetime"
eos.1hr.med$date<-as.Date(eos.1hr.med$datetime, c(format="%Y-%m-%d"))
## Warning in as.POSIXlt.POSIXct(x, tz = tz): unknown timezone '%Y-%m-%d'
eos.1hr.med$lt10<-ifelse(eos.1hr.med$salinity <10, "low", "no")
eos.low.event<-subset(eos.1hr.med, eos.1hr.med$lt10 == "low")
eos.low.event<-eos.low.event[!duplicated(eos.low.event$date),]
180
Richardson Bay
names(rb.1hr.med)[names(rb.1hr.med) == "date"] <- "datetime"
rb.1hr.med$date<-as.Date(rb.1hr.med$datetime, c(format="%Y-%m-%d"))
## Warning in as.POSIXlt.POSIXct(x, tz = tz): unknown timezone '%Y-%m-%d'
rb.1hr.med$lt10<-ifelse(rb.1hr.med$Sal <10, "low", "no")
rb.low.event<-subset(rb.1hr.med, rb.1hr.med$lt10 == "low")
rb.low.event<-rb.low.event[!duplicated(rb.low.event$date),]
19
Fort Point
names(fp.1hr.med)[names(fp.1hr.med) == "date"] <- "datetime"
fp.1hr.med$date<-as.Date(fp.1hr.med$datetime, c(format="%Y-%m-%d"))
## Warning in as.POSIXlt.POSIXct(x, tz = tz): unknown timezone '%Y-%m-%d'
fp.1hr.med$lt10<-ifelse(fp.1hr.med$salinity <10, "low", "no")
fp.low.event<-subset(fp.1hr.med, fp.1hr.med$lt10 == "low")
fp.low.event<-fp.low.event[!duplicated(fp.low.event$date),]
27
How many days had at least one hourly median reading of salinity below 10:
China Camp 191
EOS 180
Richardson Bay 19
Fort Point 27
How many days had at least one hourly median reading of salinity below 20?
Using hourly median data
China Camp
cc.1hr.med$lt20<-ifelse(cc.1hr.med$Sal <20, "low", "no")
cc.low.event<-subset(cc.1hr.med, cc.1hr.med$lt20 == "low")
cc.low.event<-cc.low.event[!duplicated(cc.low.event$date),]
493
Tiburon
eos.1hr.med$lt20<-ifelse(eos.1hr.med$salinity <20, "low", "no")
eos.low.event<-subset(eos.1hr.med, eos.1hr.med$lt20 == "low")
eos.low.event<-eos.low.event[!duplicated(eos.low.event$date),]
400
Richardson Bay
rb.1hr.med$lt20<-ifelse(rb.1hr.med$Sal <20, "low", "no")
rb.low.event<-subset(rb.1hr.med, rb.1hr.med$lt20 == "low")
rb.low.event<-rb.low.event[!duplicated(rb.low.event$date),]
154
Fort Point
fp.1hr.med$lt20<-ifelse(fp.1hr.med$salinity <20, "low", "no")
fp.low.event<-subset(fp.1hr.med, fp.1hr.med$lt20 == "low")
fp.low.event<-fp.low.event[!duplicated(fp.low.event$date),]
173
How many days had at least one hourly median reading of salinity below 20:
China Camp 493
EOS 400
Richardson Bay 154
Fort Point 173
How many days had a daily median salinity below 10?
Using the data with the outliers removed (flagged and +/- 2sd), then calculating the daily median. Need a column named "date" with a timestamp for this to work. Using "day" as the measurement instead of 24 hours.
China Camp
cc.daily.med<- timeAverage(cc,
avg.time="1 day",
data.thresh = 0,
statistic = "median",
interval = "15 min",
remove.calm = FALSE,
type = "default")
## Warning: Missing dates detected, removing 7 lines
## Warning in checkPrep(mydata, vars, type = "default", remove.calm = FALSE, :
## Detected data with Daylight Saving Time, converting to UTC/GMT
cc.daily.med$lt10<-ifelse(cc.daily.med$Sal <10, "low", "no")
cc.low.event<-subset(cc.daily.med, cc.daily.med$lt10 == "low")
cc.low.event<-cc.low.event[!duplicated(cc.low.event$date),]
153
EOS
eos.daily.med<- timeAverage(eos,
avg.time="1 day",
data.thresh = 0,
statistic = "median",
interval = "6 min",
remove.calm = FALSE,
type = "default")
## Warning in checkPrep(mydata, vars, type = "default", remove.calm = FALSE, :
## Detected data with Daylight Saving Time, converting to UTC/GMT
eos.daily.med$lt10<-ifelse(eos.daily.med$salinity <10, "low", "no")
eos.low.event<-subset(eos.daily.med, eos.daily.med$lt10 == "low")
eos.low.event<-eos.low.event[!duplicated(eos.low.event$date),]
69
Richardson Bay
rb.daily.med<- timeAverage(rb,
avg.time="1 day",
data.thresh = 0,
statistic = "median",
interval = "15 min",
remove.calm = FALSE,
type = "default")
## Warning: Missing dates detected, removing 11 lines
## Warning in checkPrep(mydata, vars, type = "default", remove.calm = FALSE, :
## Detected data with Daylight Saving Time, converting to UTC/GMT
rb.daily.med$lt10<-ifelse(rb.daily.med$Sal <10, "low", "no")
rb.low.event<-subset(rb.daily.med, rb.daily.med$lt10 == "low")
rb.low.event<-rb.low.event[!duplicated(rb.low.event$date),]
12
Fort Point
fp.daily.med<- timeAverage(fp,
avg.time="1 day",
data.thresh = 0,
statistic = "median",
interval = "6 min",
remove.calm = FALSE,
type = "default")
## Warning: Missing dates detected, removing 27 lines
## Warning in checkPrep(mydata, vars, type = "default", remove.calm = FALSE, :
## Detected data with Daylight Saving Time, converting to UTC/GMT
fp.daily.med$lt10<-ifelse(fp.daily.med$salinity <10, "low", "no")
fp.low.event<-subset(fp.daily.med, fp.daily.med$lt10 == "low")
fp.low.event<-fp.low.event[!duplicated(fp.low.event$date),]
9
How many days had a daily median salinity below 10:
China Camp 153
EOS 69
Richardson Bay 12
Fort Point 9
How many days had a daily median salinity below 20?
Using the data with the outliers removed (flagged and +/- 2sd), then calculating the daily median. Need a column named "date" with a timestamp for this to work.
China Camp
cc.daily.med$lt20<-ifelse(cc.daily.med$Sal <20, "low", "no")
cc.low.event<-subset(cc.daily.med, cc.daily.med$lt20 == "low")
cc.low.event<-cc.low.event[!duplicated(cc.low.event$date),]
375
EOS
eos.daily.med$lt20<-ifelse(eos.daily.med$salinity <20, "low", "no")
eos.low.event<-subset(eos.daily.med, eos.daily.med$lt20 == "low")
eos.low.event<-eos.low.event[!duplicated(eos.low.event$date),]
258
Richardson Bay
rb.daily.med$lt20<-ifelse(rb.daily.med$Sal <20, "low", "no")
rb.low.event<-subset(rb.daily.med, rb.daily.med$lt20 == "low")
rb.low.event<-rb.low.event[!duplicated(rb.low.event$date),]
142
Fort Point
fp.daily.med$lt20<-ifelse(fp.daily.med$salinity <20, "low", "no")
fp.low.event<-subset(fp.daily.med, fp.daily.med$lt20 == "low")
fp.low.event<-fp.low.event[!duplicated(fp.low.event$date),]
60
How many days had a daily median salinity below 20:
China Camp 426
EOS 258
Richardson Bay 142
Fort Point 60
All together
Will need to format better eventually but this is just a data dive.
How many days had at least one hourly median reading of salinity below 10:
China Camp 191
EOS 180
Richardson Bay 19
Fort Point 27
How many days had at least one hourly median reading of salinity below 20:
China Camp 493
EOS 400
Richardson Bay 154
Fort Point 173
How many days had a daily median salinity below 10:
China Camp 153
EOS 69
Richardson Bay 12
Fort Point 9
How many days had a daily median salinity below 20:
China Camp 426
EOS 258
Richardson Bay 142
Fort Point 60
It's interesting that you see the idea of Richardson Bay as a salinity refuge in the hourly median data but not in the daily median data. Possibly the influence of the open ocean dominates the daily median at Fort Point. There is certainly still something interesting happening at Richardson Bay. I'll play around with the time window and see what happens; let's look into the 12-hr and 4-hr medians.
4 hour median low events
How many four hour medians were below a salinity of 10?
#China Camp: 810
cc.4hr.med<- timeAverage(cc,
avg.time="4 hour",
data.thresh = 0,
statistic = "median",
interval = "15 min",
remove.calm = FALSE,
type = "default")
## Warning: Missing dates detected, removing 7 lines
## Warning in checkPrep(mydata, vars, type = "default", remove.calm = FALSE, :
## Detected data with Daylight Saving Time, converting to UTC/GMT
cc.4hr.med$lt10<-ifelse(cc.4hr.med$Sal <10, "low", "no")
cc.low.event<-subset(cc.4hr.med, cc.4hr.med$lt10 == "low")
cc.low.event<-cc.low.event[!duplicated(cc.low.event$date),]
#EOS:510
eos.4hr.med<- timeAverage(eos,
avg.time="4 hour",
data.thresh = 0,
statistic = "median",
interval = "6 min",
remove.calm = FALSE,
type = "default")
## Warning in checkPrep(mydata, vars, type = "default", remove.calm = FALSE, :
## Detected data with Daylight Saving Time, converting to UTC/GMT
eos.4hr.med$lt10<-ifelse(eos.4hr.med$salinity <10, "low", "no")
eos.low.event<-subset(eos.4hr.med, eos.4hr.med$lt10 == "low")
eos.low.event<-eos.low.event[!duplicated(eos.low.event$date),]
#Richardson Bay: 68
rb.4hr.med<- timeAverage(rb,
avg.time="4 hour",
data.thresh = 0,
statistic = "median",
interval = "15 min",
remove.calm = FALSE,
type = "default")
## Warning: Missing dates detected, removing 11 lines
## Warning: Detected data with Daylight Saving Time, converting to UTC/GMT
rb.4hr.med$lt10<-ifelse(rb.4hr.med$Sal <10, "low", "no")
rb.low.event<-subset(rb.4hr.med, rb.4hr.med$lt10 == "low")
rb.low.event<-rb.low.event[!duplicated(rb.low.event$date),]
#Fort Point: 57
fp.4hr.med<- timeAverage(fp,
avg.time="4 hour",
data.thresh = 0,
statistic = "median",
interval = "6 min",
remove.calm = FALSE,
type = "default")
## Warning: Missing dates detected, removing 27 lines
## Warning: Detected data with Daylight Saving Time, converting to UTC/GMT
fp.4hr.med$lt10<-ifelse(fp.4hr.med$salinity <10, "low", "no")
fp.low.event<-subset(fp.4hr.med, fp.4hr.med$lt10 == "low")
fp.low.event<-fp.low.event[!duplicated(fp.low.event$date),]
How many four hour medians were below a salinity of 20?
#China Camp: 2196
cc.4hr.med$lt20<-ifelse(cc.4hr.med$Sal <20, "low", "no")
cc.low.event<-subset(cc.4hr.med, cc.4hr.med$lt20 == "low")
cc.low.event<-cc.low.event[!duplicated(cc.low.event$date),]
#EOS: 1449
eos.4hr.med$lt20<-ifelse(eos.4hr.med$salinity <20, "low", "no")
eos.low.event<-subset(eos.4hr.med, eos.4hr.med$lt20 == "low")
eos.low.event<-eos.low.event[!duplicated(eos.low.event$date),]
#Richardson Bay: 811
rb.4hr.med$lt20<-ifelse(rb.4hr.med$Sal <20, "low", "no")
rb.low.event<-subset(rb.4hr.med, rb.4hr.med$lt20 == "low")
rb.low.event<-rb.low.event[!duplicated(rb.low.event$date),]
#Fort Point: 429
fp.4hr.med$lt20<-ifelse(fp.4hr.med$salinity <20, "low", "no")
fp.low.event<-subset(fp.4hr.med, fp.4hr.med$lt20 == "low")
fp.low.event<-fp.low.event[!duplicated(fp.low.event$date),]
12 hour median low events
How many 12 hour medians were below a salinity of 10?
#China Camp: 303
cc.12hr.med<- timeAverage(cc,
avg.time="12 hour",
data.thresh = 0,
statistic = "median",
interval = "15 min",
remove.calm = FALSE,
type = "default")
## Warning: Missing dates detected, removing 7 lines
## Warning in checkPrep(mydata, vars, type = "default", remove.calm = FALSE, :
## Detected data with Daylight Saving Time, converting to UTC/GMT
cc.12hr.med$lt10<-ifelse(cc.12hr.med$Sal <10, "low", "no")
cc.low.event<-subset(cc.12hr.med, cc.12hr.med$lt10 == "low")
cc.low.event<-cc.low.event[!duplicated(cc.low.event$date),]
#EOS:147
eos.12hr.med<- timeAverage(eos,
avg.time="12 hour",
data.thresh = 0,
statistic = "median",
interval = "6 min",
remove.calm = FALSE,
type = "default")
## Warning in checkPrep(mydata, vars, type = "default", remove.calm = FALSE, :
## Detected data with Daylight Saving Time, converting to UTC/GMT
eos.12hr.med$lt10<-ifelse(eos.12hr.med$salinity <10, "low", "no")
eos.low.event<-subset(eos.12hr.med, eos.12hr.med$lt10 == "low")
eos.low.event<-eos.low.event[!duplicated(eos.low.event$date),]
#Richardson Bay: 23
rb.12hr.med<- timeAverage(rb,
avg.time="12 hour",
data.thresh = 0,
statistic = "median",
interval = "15 min",
remove.calm = FALSE,
type = "default")
## Warning: Missing dates detected, removing 11 lines
## Warning: Detected data with Daylight Saving Time, converting to UTC/GMT
rb.12hr.med$lt10<-ifelse(rb.12hr.med$Sal <10, "low", "no")
rb.low.event<-subset(rb.12hr.med, rb.12hr.med$lt10 == "low")
rb.low.event<-rb.low.event[!duplicated(rb.low.event$date),]
#Fort Point: 17
fp.12hr.med<- timeAverage(fp,
avg.time="12 hour",
data.thresh = 0,
statistic = "median",
interval = "6 min",
remove.calm = FALSE,
type = "default")
## Warning: Missing dates detected, removing 27 lines
## Warning: Detected data with Daylight Saving Time, converting to UTC/GMT
fp.12hr.med$lt10<-ifelse(fp.12hr.med$salinity <10, "low", "no")
fp.low.event<-subset(fp.12hr.med, fp.12hr.med$lt10 == "low")
fp.low.event<-fp.low.event[!duplicated(fp.low.event$date),]
How many 12 hour medians were below a salinity of 20?
#China Camp: 741
cc.12hr.med$lt20<-ifelse(cc.12hr.med$Sal <20, "low", "no")
cc.low.event<-subset(cc.12hr.med, cc.12hr.med$lt20 == "low")
cc.low.event<-cc.low.event[!duplicated(cc.low.event$date),]
#EOS: 515
eos.12hr.med$lt20<-ifelse(eos.12hr.med$salinity <20, "low", "no")
eos.low.event<-subset(eos.12hr.med, eos.12hr.med$lt20 == "low")
eos.low.event<-eos.low.event[!duplicated(eos.low.event$date),]
#Richardson Bay: 280
rb.12hr.med$lt20<-ifelse(rb.12hr.med$Sal <20, "low", "no")
rb.low.event<-subset(rb.12hr.med, rb.12hr.med$lt20 == "low")
rb.low.event<-rb.low.event[!duplicated(rb.low.event$date),]
#Fort Point: 129
fp.12hr.med$lt20<-ifelse(fp.12hr.med$salinity <20, "low", "no")
fp.low.event<-subset(fp.12hr.med, fp.12hr.med$lt20 == "low")
fp.low.event<-fp.low.event[!duplicated(fp.low.event$date),]
Daily Median
Graphing out of curiosity
a<-cc.daily.med %>% ggplot(aes(x=date, y=Sal, color=Sal))+
geom_point(alpha=0.5)+ylim(0,35)+
geom_hline(yintercept=10, linetype="dashed", color = "red")+
geom_hline(yintercept=20, linetype="dashed", color = "red")+
labs(title="Daily median salinity from China Camp",
subtitle="01/01/2017 - 12/31/2019",
caption= "data courtesy of NERR")
b<-eos.daily.med %>% ggplot(aes(x=date, y=salinity, color=salinity))+
geom_point(alpha=0.5)+ylim(0,35)+
geom_hline(yintercept=10, linetype="dashed", color = "red")+
geom_hline(yintercept=20, linetype="dashed", color = "red")+
labs(title="Daily median salinity from EOS research pier",
subtitle="01/01/2017 - 12/31/2019",
caption= "data courtesy of CeNCOOS")
c<-rb.daily.med %>% ggplot(aes(x=date, y=Sal, color=Sal))+
geom_point(alpha=0.5)+ylim(0,35)+
geom_hline(yintercept=10, linetype="dashed", color = "red")+
geom_hline(yintercept=20, linetype="dashed", color = "red")+
labs(title="Daily median salinity from Richardson Bay",
subtitle="01/01/2017 - 12/31/2019",
caption= "data courtesy of NERR")
d<-fp.daily.med %>% ggplot(aes(x=date, y=salinity, color=salinity))+
geom_point(alpha=0.5)+ylim(0,35)+
geom_hline(yintercept=10, linetype="dashed", color = "red")+
geom_hline(yintercept=20, linetype="dashed", color = "red")+
labs(title="Daily median salinity from Fort Point",
subtitle="01/01/2017 - 12/31/2019",
caption= "data courtesy of CeNCOOS")
ggarrange(a,b,c,d, nrow=4, ncol=1)
## Warning: Removed 164 rows containing missing values (geom_point).
## Warning: Removed 4 rows containing missing values (geom_point).
## Warning: Removed 200 rows containing missing values (geom_point).
## Warning: Removed 80 rows containing missing values (geom_point).
Not sure this is informative, but I was curious.
Duration of low salinity events
Really not sure how to do this yet. How can I find consecutive days? E.g., how many events <10 lasted at least 2 days in a row? https://predictivehacks.com/count-the-consecutive-events-in-r/
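One common approach is run-length encoding: rle() collapses the daily "low"/"no" flag into runs, and you can then count runs of "low" of at least a given length. A sketch on a toy vector; the real input would be a daily lt10 column ordered by date with no missing days (gaps would need to be filled first):

```r
# Toy daily flags: runs are no(1), low(2), no(1), low(3), no(1).
lt10 <- c("no", "low", "low", "no", "low", "low", "low", "no")
runs <- rle(lt10)
# Number of low-salinity events lasting at least 2 consecutive days:
sum(runs$values == "low" & runs$lengths >= 2)
## [1] 2
```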
Low salinity events and fucus density
Number of hourly medians below salinity 10 and density of fucus. The x.daily.med dfs have lt10 and lt20 columns that I can subset for this. I also need the field data. I'll be comparing the field data to the closest environmental data station:
China Camp -> Paradise Cay
EOS -> Point Chauncy
Richardson Bay -> Brickyard Park
Fort Point -> Horseshoe Bay
I’ll look at the environmental data between field survey dates.
Field data. For the lm model to work, the variables need to be the same length. I'm going to add the density data to the salinity data, which will introduce a lot of "NA"s to the density column, but that's okay. I'm going to add an arbitrary survey time so that the survey data is only added once per day to the salinity data. You need a column with the same name and format to merge dfs, which is why I'm creating a survey "datetime". I have two options to merge: 1) add the same survey time to all survey dates, which means I'll have to aggregate the field data to a monthly mean, or 2) add a sequential survey time (1:00:00 - 10:00:00) so that I can keep all of the field data in the linear model. I don't know how to do the second option, so for now I will be comparing monthly means.
#read in field data
field<-read.csv("C:/Users/chels/Box Sync/Thesis/Data/Working data/Final data/CB_field_data_10.12.2020.csv", header = TRUE, sep=",", fileEncoding="UTF-8-BOM", stringsAsFactors = FALSE)
#formatting field data to prep for merging
field$date<-as.Date(field$date, format = c("%m/%d/%Y"))
field$time<-"01:00:00"
field$datetime = paste(field$date,field$time)
field$datetime <- as.POSIXct(field$datetime, format = c("%Y-%m-%d %H:%M:%S"))
#Subsetting by site
pc<-subset(field, field$site.old == "PC")
nd<-subset(field, field$site.old == "ND")
by<-subset(field, field$site.old == "BY")
hs<-subset(field, field$site.old == "HS")
#monthly mean of fucus density
pc.mon.den<-aggregate(no.fuc.q ~ datetime, pc, mean)
nd.mon.den<-aggregate(no.fuc.q ~ datetime, nd, mean)
by.mon.den<-aggregate(no.fuc.q ~ datetime, by, mean)
hs.mon.den<-aggregate(no.fuc.q ~ datetime, hs, mean)
#merging salinity data with fucus data
cc.pc.mon.den<-merge(cc.1hr.med, pc.mon.den, by="datetime", all.x=TRUE, all.y=TRUE)
eos.nd.mon.den<-merge(eos.1hr.med, nd.mon.den, by="datetime", all.x=TRUE, all.y=TRUE)
rb.by.mon.den<-merge(rb.1hr.med, by.mon.den, by="datetime", all.x=TRUE, all.y=TRUE)
fp.hs.mon.den<-merge(fp.1hr.med, hs.mon.den, by="datetime", all.x=TRUE, all.y=TRUE)
Getting stuck here. Not sure if I should be using monthly density or density at the quadrat level. Still struggling to incorporate a "lag" or date term, i.e., looking at salinity in between field dates.
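For the "salinity between field dates" piece, one option is to bin each salinity reading into the interval between consecutive survey dates with findInterval(), then summarise within each interval. A sketch with made-up survey dates; in practice they would come from the field df:

```r
# Hypothetical survey dates (placeholders; the real ones come from the field data).
survey.dates <- as.Date(c("2018-01-15", "2018-02-15", "2018-03-15"))
# Tag each hourly median with the survey interval it falls in, then summarise.
cc.between <- cc.1hr.med %>%
  mutate(interval = findInterval(as.Date(date), survey.dates)) %>%
  group_by(interval) %>%
  summarise(med.sal = median(Sal, na.rm = TRUE),
            n.lt10  = sum(Sal < 10, na.rm = TRUE))
```

Interval 0 would then hold all readings before the first survey, interval 1 the readings between the first and second surveys, and so on, which could be joined back to the density data by survey date.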